Skip to content

fix(x402): classify transaction_held as TRANSACTION_PENDING (closes #93)#94

Closed
tfireubs-ui wants to merge 1 commit intoaibtcdev:mainfrom
tfireubs-ui:fix/classify-transaction-held-93
Closed

fix(x402): classify transaction_held as TRANSACTION_PENDING (closes #93)#94
tfireubs-ui wants to merge 1 commit intoaibtcdev:mainfrom
tfireubs-ui:fix/classify-transaction-held-93

Conversation

@tfireubs-ui
Copy link
Copy Markdown
Contributor

Problem

When the relay's /settle endpoint returns errorReason: "transaction_held" (relay accepted the tx but queued it due to a sender nonce gap), classifyPaymentError() doesn't match this error reason. It falls through to the catch-all UNEXPECTED_SETTLE_ERROR — which produces a 500 with "code": "UNKNOWN_ERROR". The caller has no indication this is retryable and no idea why it failed.

Fix

Added a transaction_held case to classifyPaymentError that maps to TRANSACTION_PENDING with retryAfter: 30:

if (combined.includes("transaction_held") || combined.includes("transaction held")) {
  return { code: X402_ERROR_CODES.TRANSACTION_PENDING, message: "Transaction held in relay nonce queue, please retry", httpStatus: 402, retryAfter: 30 };
}

The 30s retryAfter is longer than transaction_pending's 10s because nonce gap resolution takes more time (relay needs to fill or clear the gap before it can dispatch the held tx).

After this fix, callers receive 402 + Retry-After: 30 with a clear message instead of 500 UNKNOWN_ERROR. Their existing retry logic (which already handles TRANSACTION_PENDING) handles this gracefully.

Closes #93

…ibtcdev#93)

The relay's /settle endpoint returns errorReason: "transaction_held" when
a sender has a nonce gap and the relay queues the tx rather than dispatching
it. This error reason was not matched by classifyPaymentError, causing it to
fall through to the catch-all UNEXPECTED_SETTLE_ERROR (500 UNKNOWN_ERROR).

Add a transaction_held case that maps to TRANSACTION_PENDING with a 30s
retryAfter — longer than the 10s for transaction_pending since resolving
a nonce gap takes more time. The client receives 402 with a clear retry
signal rather than a confusing 500.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@arc0btc arc0btc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classifies transaction_held as TRANSACTION_PENDING with a 30s retry — exactly right.

What works well:

  • Placement in the classifier is correct. At the main call site (line 409), the function receives settleResult.errorReason as the first arg, so combined = "transaction_held transaction_held". The earlier nonce check doesn't fire ("transaction_held" contains no "nonce"), and the new case is reached cleanly.
  • The 30s retryAfter is well-calibrated. Relay nonce gap resolution is not instantaneous — we've seen gaps take 30–90s to clear in production. Callers with existing retry-on-TRANSACTION_PENDING logic get the right signal to wait longer than the normal 10s.
  • Both match forms (transaction_held and transaction held) are consistent with every other case in the file — good pattern adherence.
  • The inline comment explains the relay behavior clearly for anyone debugging a future incident.

[nit] The space-separated form "transaction held" looks defensive rather than necessary — every other relay error code uses snake_case exclusively in its errorReason. If the relay API contract guarantees snake_case, removing it keeps the check tight. But keeping it is consistent with how all other cases are written, so either way is fine.

Operational note: We process x402 payments through this relay continuously. During relay nonce gap windows (which can persist for 30–90s), held transactions were falling through to 500 UNKNOWN_ERROR — giving callers no indication the failure was transient or retryable. This fix makes those windows transparent and automatically recoverable via existing retry logic.

@tfireubs-ui
Copy link
Copy Markdown
Contributor Author

Merge ping — APPROVED (arc0btc). Classifies transaction_held as TRANSACTION_PENDING (closes #93).

@whoabuddy
Copy link
Copy Markdown
Contributor

Hi! I wanted to let you know that #107 supersedes the transaction_held classification work here. The relay now surfaces terminalReason values natively via the boring-tx state machine, so the string-matching approach in this PR is no longer needed — TERMINAL_REASON_TO_CATEGORY from @aibtc/tx-schemas handles the classification directly.

Once you've had a chance to look at the new PR and confirm, would you mind closing this one? No rush — just want to avoid confusion once the branch lands. Thanks for the work here, the problem framing was helpful context for the design.

whoabuddy added a commit that referenced this pull request Apr 23, 2026
Record PR title, body, issue comment text, and all URLs for the
boring-tx state machine PR (#107) and related issue comments on
#94, #106, and #87.

Co-Authored-By: Claude <noreply@anthropic.com>
whoabuddy added a commit that referenced this pull request Apr 29, 2026
* docs(planning): phase 1 research notes for boring-tx adoption

Map from x402-api compat-shim code paths to the native boring-tx events
emitted by x402-sponsor-relay v1.30.1. Covers shim inventory, behavior
comparison table, tx-schemas entry points, relay RPC response shapes,
PaymentPollingDO public API sketch, and risk list with 8 identified risks.

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(deps): bump @aibtc/tx-schemas for boring-tx state machine

Bump @aibtc/tx-schemas constraint from ^0.3.0 to ^1.0.0. The 0.x.y semver
locking meant the installed 1.0.0 package was outside the resolvable range;
pinning to ^1.0.0 correctly tracks the native boring-tx state machine exports
needed in Phases 3-5.

Also remove stale patches/x402-stacks+2.0.1.patch — the upstream package
now ships with the desired 120000ms HTTP client timeout natively, making the
patch-package postinstall step error on every npm install.

Baseline: npm run check (tsc --noEmit) exits 0, npm run deploy:dry-run builds
clean at 1027.76 KiB before any logic changes.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor(payments): route payment types through @aibtc/tx-schemas

Create src/services/payment-contract.ts as a thin re-export helper over
@aibtc/tx-schemas subpaths, providing a single import point for all
payment-lifecycle types used across middleware, utilities, and Durable
Objects in this service.

Replace SettlementResponseV2 from x402-stacks with SettleResult (aliased
from HttpSettleResponse in tx-schemas) in middleware/x402.ts and types.ts.
The discriminated union type is structurally compatible; a cast is applied
at the verifier.settle() call site — Phase 5 removes the x402-stacks
dependency entirely when the RPC path takes over.

extractCanonicalPaymentDetails is preserved (Phase 5 removes it).

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(payments): add PaymentPollingDO for checkStatusUrl polling

Adds PaymentPollingDO (SQLite-backed Durable Object) that tracks in-flight
payments by polling checkStatusUrl with exponential backoff until terminal.

Key design decisions:
- _fetchStatus() is the single relay-contact seam: Phase 4 uses HTTP GET,
  issue #87 swaps to env.X402_RELAY.checkPayment(paymentId) — one-line change
- computeDerivedHints() extracted to src/utils/payment-hints.ts so it is
  testable with bun:test without requiring cloudflare:workers runtime
- DO instances are namespaced by paymentId (one per in-flight payment)
- Alarm backoff: 5s (polls 1-3), 15s (4-6), 60s (7+), 10-min timeout

Wiring:
- wrangler.jsonc: PAYMENT_POLLING_DO binding + v3 migration tag in all envs
- src/types.ts: Env interface updated with PAYMENT_POLLING_DO binding
- src/index.ts: exports class; mounts GET /payment-status/:paymentId (free)

Unit tests (36 passing):
- computeDerivedHints covers all terminal reason categories
- Stub-based track → poll → terminal happy-path flow
- derivedHints per category verified

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(payments): emit native payment.* events, drop compat shim

Middleware now mints a client-side paymentId ("pay_" + crypto.randomUUID())
before submitting to the relay, injects it as the payment-identifier
idempotency input, extracts checkStatusUrl from the relay response extensions
(with a fallback construction), and registers the payment with
PaymentPollingDO.track() as fire-and-forget.

Five native lifecycle events replace all compat-shim-era event names:
  payment.initiated  — paymentId minted, about to submit to relay
  payment.pending    — relay acknowledged but payment is still in-flight
  payment.confirmed  — relay settled successfully
  payment.failed     — relay rejected with a terminal failure reason
  payment.replaced   — payment replaced by another tx (nonce race)

Deleted entirely:
  - extractCanonicalPaymentDetails() and all internal shim helpers
  - inferLegacyStatus() and inferLegacyTerminalReason()
  - getRetryDecisionContext() (tests/_shared_utils.ts update deferred to Phase 7)
  - compat_shim_used log field from buildPaymentLogFields
  - compatShimUsed / source fields from CanonicalPaymentDetails and RetryDecisionContext
  - OpenAPI schema for details.canonical in 402 response body

Unit tests updated to cover the three surviving classifier predicates and
the revised instability derivation signature.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(payments): add retryable/retryAfter/nextSteps error hints

All non-200 payment error responses now carry structured retry hints in
both the JSON body and the payment-response header (base64 JSON):

  { retryable, nextSteps, retryAfter? }

nextSteps tokens are stable identifiers tied to tx-schemas terminal
reason categories — not free-form prose — so clients can branch without
string-parsing:
  rebuild_and_resign  — sender nonce issue, build fresh tx
  retry_later         — transient relay/settlement error
  start_new_payment   — identity lost/replaced, restart x402 flow
  fix_and_resend      — invalid payload, fix before retrying
  wait_for_confirmation — confirmed, delivery should proceed

classifyPaymentError() now checks canonical status (failed/replaced/
not_found) from the relay response before falling back to string
heuristics, so boring-tx relay responses are classified accurately.

Settlement failure path: computeDerivedHints() maps canonical status +
terminalReason → hints with no DO round-trip.
Exception path: hintsFromClassifiedCode() derives hints from classified
error code when no canonical status is available.

Updated llms-full.txt error handling section and /topics/payment-flow
topic doc to document the new hint shape, token vocabulary, and client
retry pattern.

Co-Authored-By: Claude <noreply@anthropic.com>

* test(payments): cover boring-tx lifecycle end-to-end

Finish the pending tests/_shared_utils.ts diff (NonceTracker, nonce resets
on retry, signPaymentWithNonce helper). Add getRetryDecisionContext and
RetryDecisionContext to payment-status.ts — these were already imported in
the committed shared utils but never implemented.

Add X-PAYMENT-ID response header in the middleware success path so lifecycle
tests can extract the relay paymentId from a 200 response without parsing
the settlement result's opaque extensions field.

Add tests/payment-polling-lifecycle.test.ts (runPaymentPollingLifecycle):
- Makes a real x402 payment to /hashing/sha256
- Reads X-PAYMENT-ID from the 200 response header
- Polls GET /payment-status/:paymentId (free DO route) until terminal
- Asserts the DO snapshot has expected shape (paymentId, checkStatusUrl,
  polledAt, pollCount, terminal status)

Register payment-polling in LIFECYCLE_RUNNERS in _run_all_tests.ts.

All 65 payment-*.unit.test.ts pass. npm test (quick mode, 14 stateless
endpoints) passes 14/14. npm run test:full requires X402_CLIENT_PK and
a local worker (X402_WORKER_URL=http://localhost:8787) since X-PAYMENT-ID
header is not yet in the deployed staging worker.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor(payments): simplify post-boring-tx adoption

Remove three dead imports from x402.ts middleware (isInFlightPaymentState,
isRelayRetryableTerminalReason, isSenderRebuildTerminalReason) that were added
during Phase 5 but never used in the middleware body — the retry logic that
would have used them lives in tests/_shared_utils.ts instead.

Clean up two stale phase-reference comments in payment-contract.ts that referred
to "Phase 5" as a future event ("will widen this surface", "removes the
x402-stacks dependency") — those phases are now complete.

TERMINAL_STATUSES duplication between payment-hints.ts and PaymentPollingDO.ts
was reviewed and intentionally left — the isolation benefit outweighs the DRY
concern for a 4-element constant across a DO boundary.

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(planning): phase 9 verification log

All blocking checks pass: npm run check (0 errors), npm run deploy:dry-run
(clean, PAYMENT_POLLING_DO binding confirmed), npm test (14/14), bun unit
tests (114/114). Branch already on origin/main tip — no rebase needed.
Known non-blocker: test:full payment-polling-lifecycle is deployment-gated
(X-PAYMENT-ID header not yet on live staging).

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(planning): phase 10 PR handoff

Record PR title, body, issue comment text, and all URLs for the
boring-tx state machine PR (#107) and related issue comments on
#94, #106, and #87.

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: re-trigger Workers Builds after migration bootstrap

---------

Co-authored-by: Claude <noreply@anthropic.com>
@whoabuddy
Copy link
Copy Markdown
Contributor

Closing — superseded by #107 which addresses transaction classification as part of the boring-tx state machine. The TERMINAL_REASON_TO_CATEGORY mapping from @aibtc/tx-schemas now handles transaction_held natively, replacing the string-matching approach. Thank you for the contribution — the issue you identified was real and the classification gap is now resolved properly in #107.

@whoabuddy whoabuddy closed this Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

stx402.com /registry/register returns 500 transaction_held — /settle calls failing on relay v1.27.1

3 participants